Learning to Extract Attribute Values from a Search Engine with Few Examples
Authors
Abstract
We propose an attribute value extraction method based on analysing snippets from a search engine. First, a pattern-based detector is applied to locate candidate attribute values in snippets. Then a classifier predicts whether each candidate value is correct. Training such a classifier requires only a few annotated triples, because sufficient training data can be generated automatically by matching these triples back to snippets and titles. Finally, because a correct value may appear in multiple snippets, the individual predictions are combined by voting to exploit this redundancy. Experiments on both Chinese and English corpora in the celebrity domain demonstrate the effectiveness of our method: with only 15 annotated triples, 7 of 12 attributes reach a precision above 85%, and 11 of 12 attributes improve over a state-of-the-art method.
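The three stages described in the abstract (pattern-based candidate detection, automatic labelling from a few seed triples, and voting over per-snippet predictions) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the year regex, the function names, and the `score` callable standing in for the trained classifier are all assumptions, and only snippets (not titles) are matched.

```python
import re
from collections import defaultdict

# Hypothetical value pattern for a "birth year" attribute; the paper's
# actual patterns are not given in the abstract.
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def detect_candidates(snippet):
    """Return every substring of the snippet that matches the value pattern."""
    return YEAR_PATTERN.findall(snippet)


def label_by_matching(seed_triples, snippets_by_entity):
    """Build (snippet, candidate, label) rows by matching seed values to snippets.

    A candidate is labelled positive when it equals the annotated value of
    the seed triple (entity, attribute, value), negative otherwise.
    """
    rows = []
    for entity, _attribute, value in seed_triples:
        for snippet in snippets_by_entity.get(entity, []):
            for candidate in detect_candidates(snippet):
                rows.append((snippet, candidate, candidate == value))
    return rows


def vote(predictions):
    """Sum the per-snippet scores of each candidate and return the top one."""
    totals = defaultdict(float)
    for candidate, score in predictions:
        totals[candidate] += score
    return max(totals, key=totals.get) if totals else None


def extract(snippets, score):
    """Detect candidates in every snippet, score each one, then vote.

    `score(snippet, candidate)` is a stand-in for the trained classifier.
    """
    predictions = [
        (candidate, score(snippet, candidate))
        for snippet in snippets
        for candidate in detect_candidates(snippet)
    ]
    return vote(predictions)
```

In practice the rows produced by `label_by_matching` would be turned into features and used to train the classifier that `score` represents; the sketch leaves that step out because the abstract does not specify the features or the model.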
Similar resources
Learning cross-level certain and possible rules by rough sets
Machine learning can extract desired knowledge and ease the development bottleneck in building expert systems. Among the proposed approaches, deriving rules from training examples is the most common. Given a set of examples, a learning program tries to induce rules that describe each class. Recently, the rough-set theory has been widely used in dealing with data classification problems. Most of...
A Machine Learning Approach towards Improving Internet Search with a Question-Answering System
This paper introduces a prototype to extract common sense knowledge from the World Wide Web. The prototype combines a search engine with an automated database. It works by extracting information from the enormous amount of documents available on the World Wide Web. Two common examples are that men love women and that women love men (bi-directional relationship) or that boys like toys (unidirect...
Lightly-Supervised Attribute Extraction
Web search engines can greatly benefit from knowledge about attributes of entities present in search queries. In this paper, we introduce lightly-supervised methods for extracting entity attributes from natural language text. Using these methods, we are able to extract large numbers of attributes of different entities at fairly high precision from a large natural language corpus. We compare our...
DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web
The web is a rich resource of structured data. There has been an increasing interest in using web structured data for many applications such as data integration, web search and question answering. In this paper, we present DEXTER, a system to find product sites on the web, and detect and extract product specifications from them. Since product specifications exist in multiple product sites, our ...
Mining from incomplete quantitative data by fuzzy rough sets
Machine learning can extract desired knowledge from existing training examples and ease the development bottleneck in building expert systems. Most learning approaches derive rules from complete data sets. If some attribute values are unknown in a data set, it is called incomplete. Learning from incomplete data sets is usually more difficult than learning from complete data sets. In the past, t...
Journal title:
Volume, issue:
Pages: -
Publication date: 2013